Rapid elucidation of neutralizing antibody epitopes on emerging viral pathogens like severe acute respiratory syndrome (SARS) coronavirus (CoV) or highly pathogenic avian influenza H5N1 virus is of great importance for rational design of vaccines against these viruses. Here we combined screening of phage display random peptide libraries with a unique computer algorithm “Mapitope” to identify the discontinuous epitope of 80R, a potent neutralizing human anti-SARS monoclonal antibody against the spike protein. Using two different types of random peptide libraries which display cysteine-constrained loops or linear 13-15-mer peptides, independent panels containing 42 and 18 peptides were isolated, respectively. These peptides, which had no apparent homologous motif within or between the peptide pools and spike protein, were deconvoluted into amino acid pairs (AAPs) by Mapitope and the statistically significant pairs (SSPs) were defined. Mapitope analysis of the peptides was first performed on a theoretical model of the spike and later on the genuine crystal structure. Three clusters (A, B and C) were predicted on both structures with remarkable overlap. Cluster A ranked the highest in the algorithm in both models and coincided well with the sites of spike protein that are in contact with the receptor, consistent with the observation that 80R functions as a potent entry inhibitor. This study demonstrates that by using this novel strategy one can rapidly predict and identify a neutralizing antibody epitope, even in the absence of the crystal structure of its target protein.

© 2006 Elsevier Ltd. All rights reserved.

scFv, single chain variable fragment; ST, statistical threshold; RBD, receptor binding domain.

Introduction

With every new and emerging infectious pathogen, particularly those that are capable of causing widespread debilitating illness and death, it is necessary not only to institute local, regional and international public health care measures to prevent and contain the infections, but also to rapidly develop therapeutic strategies to elicit protective host immunity. In the case of respiratory illnesses such as severe acute respiratory syndrome (SARS), highly pathogenic H5N1 avian influenza and West Nile Virus febrile illness/encephalitis, where the importance of neutralizing antibodies in preventing disease onset is clearly established, defining the molecular determinants of the neutralizing epitope(s) is critically important in the development of an efficacious vaccine. In particular, recombinant vaccines that are capable of focusing the humoral immune response on neutralizing epitopes can be predicted to be most beneficial and may provide a more rapid way to respond to emerging biothreats than traditional attenuated or inactivated viruses or subunit vaccines.

SARS emerged as a new infectious disease and caused a serious worldwide outbreak in 2002 to 2003 with over 8000 individuals becoming infected.
Prediction of a Neutralizing Epitope for SARS


In its most severe form, infection with the novel SARS-coronavirus (SARS-CoV) was associated with progressive pneumonia, respiratory failure, and a fatality rate of ca 10%. The receptor for SARS-CoV was shortly thereafter identified as angiotensin-converting enzyme-2 (ACE-2), and the importance of neutralizing antibodies to the SARS-CoV spike protein in preventing infection in vitro and in vivo was established. However, serologic studies of both late-outbreak infected humans and serum from mice immunized with a late-outbreak strain demonstrated the presence of antibodies that were able to enhance infection of SARS-like CoV from civet cats in a pseudo-virus reporter assay. Since these enhancing mouse antibodies map to the receptor binding domain (RBD) of the Spike (S) protein, a region that would obviously be used in a subunit vaccine, it appears that some epitopes contained therein may be detrimental, and thus defining the precise nature of the neutralizing epitope(s) is warranted. Therefore, a vaccine should focus on eliciting only neutralizing antibodies and not antibodies that are either non-neutralizing or enhancing in nature.

We took the first steps toward the goal of identifying the major neutralizing epitope of SARS-CoV as a model of neutralizing epitope identification using a reverse immunological approach. In order to accomplish this task one must backtrack from the antibody of interest to its corresponding neutralizing epitope. It is then assumed that, once identified, the epitope can be reconstituted and stabilized with the intent that when administered as a vaccine it will elicit the neutralizing activity characteristic of the original monoclonal antibody (mAb). The human recombinant mAb used in this study, named 80R, was isolated from a phage display library after panning against the S1 domain of the SARS-CoV Spike protein. 80R binds to the RBD, a 193 amino acid fragment (residues 318 to 510) of spike protein, with high affinity (Kd = 1.7 nM) and is a potent neutralizing mAb in vitro and in vivo. It acts as a viral entry inhibitor by blocking the association of S protein with its receptor ACE2. Mutagenesis studies further support this conclusion, as Spike determinants involved in the binding of receptor and of 80R are in part overlapping and are likely to result from both common and unique contact residues.


Results 
The principles of the Mapitope algorithm 


A unique computer algorithm, Mapitope, enabled us to map epitopes on spike protein using peptides that bind to 80R. Mapitope is an updated user-friendly version of the algorithm previously published by Enshell-Seijffers et al. The prediction of an epitope is based on the notion that the panel of peptides derived from a random peptide library collectively represents the epitope of the mAb which they bind. The underlying principle of Mapitope is that the simplest meaningful fragment of an epitope is an amino acid pair (AAP) of residues that lie within the footprint of the epitope. These AAPs can be related to one another on the surface of the antigen such that a cluster is defined which constitutes the majority of the epitope footprint, i.e. the epitope is in essence a cluster of connected AAPs. The AAPs of the epitope need not be consecutive tandem residues of the antigen, but often are the result of juxtaposition of distant residues brought together through folding of the polypeptide chain. The distance between their carbon alphas (parameter D) defines what constitutes a legitimate pair. AAPs of the epitope are simulated by tandem residues of the peptides affinity selected from the random library. Each peptide is assumed to contain one or more epitope-relevant AAPs, which is the basis for mAb recognition of that peptide. In order to identify the statistically significant pairs (SSPs) present in the panel of peptides, the peptides are first deconvoluted into AAPs. Thus, for example, a peptide of the sequence ABCDE... would be written as the series of pairs: AB, BC, CD, DE, etc. All the AAPs derived from the panel of peptides are then pooled and the frequency of each type is calculated. It is next determined whether the representation of an AAP in the pool is higher than the random expectation and, if so, the pair is considered to be an SSP. A second parameter of the algorithm (the first being D) is the frequency of a specific pair in a given pool of AAPs derived from the panel of peptides. The number of standard deviations above randomness for a given pair is defined as the statistical threshold (ST). Once the most frequent AAPs are identified, the algorithm seeks the pairs for a selected D value on the surface of the antigen and attempts to link them into clusters.

A third parameter of the algorithm is E, the surface accessibility threshold. E defines those residues that are sufficiently exposed on the antigen’s surface to be included in the predicted epitope. The accessibility of each amino acid is automatically calculated using the software “SurfRace”, which has been integrated into the algorithm software. In this study the SSPs which were mapped on the 3-D structure of the antigen contained residues that are at least 5% exposed (E=5); however, the impact of the E parameter was examined as well (see below).
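The deconvolution and pooling steps described above can be sketched in a few lines of Python. This is only an illustration under simple assumptions, not the Mapitope implementation; the function names `deconvolute` and `pool_pairs` are hypothetical.

```python
from collections import Counter


def deconvolute(peptide: str) -> list[str]:
    """Split a peptide into its overlapping tandem pairs (AAPs)."""
    return [peptide[i:i + 2] for i in range(len(peptide) - 1)]


def pool_pairs(peptides: list[str]) -> Counter:
    """Pool the AAPs of all peptides and count each pair type."""
    counts: Counter = Counter()
    for peptide in peptides:
        counts.update(deconvolute(peptide))
    return counts


# The example from the text: ABCDE -> AB, BC, CD, DE
print(deconvolute("ABCDE"))
```

A peptide of length n yields n-1 pairs, so the total size of the pool depends on the lengths of the peptides in the panel.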

As contacts between the mAb and the antigen are 
mostly through functional moieties of the R-groups, 
conserved residues were consolidated into 13 
functional subgroups of amino acids and given 
single-letter notations: 


B=R,K; J=E,D; O=S,T; U=L,V,I;
X=Q,N; Z=W,F; A=A; C=C; G=G;
H=H; M=M; P=P; Y=Y.
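As an illustration, the translation into functional notation could be sketched as a simple lookup. The names `GROUPS`, `TO_GROUP` and `translate` are hypothetical, and P = P is included on the assumption that proline forms its own group, as the pair labels CP and PP used elsewhere in the text suggest.

```python
# Functional groups: one-letter code -> member amino acids.
# P = P is an assumption inferred from pair labels such as CP and PP.
GROUPS = {
    "B": "RK", "J": "ED", "O": "ST", "U": "LVI", "X": "QN", "Z": "WF",
    "A": "A", "C": "C", "G": "G", "H": "H", "M": "M", "P": "P", "Y": "Y",
}

# Reverse lookup: amino acid -> functional group code.
TO_GROUP = {aa: code for code, members in GROUPS.items() for aa in members}


def translate(peptide: str) -> str:
    """Rewrite a peptide in the functional one-letter notation."""
    return "".join(TO_GROUP[aa] for aa in peptide)


# First peptide listed in Table 1.
print(translate("RSGGCVGGQYCLTPTH"))
```

With this mapping the 20 standard amino acids collapse into 13 codes, so residues such as Leu, Val and Ile all contribute to the same pair types.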


In summary, a mAb is used to screen a random peptide library to generate a panel of peptides recognized by the mAb. These peptides are deconvoluted into AAPs and the SSPs are identified. These are then mapped onto the crystal structure of the antigen and the most elaborate and diverse clusters on the surface of the antigen are identified. These are regarded as the predicted epitope candidates.
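The mapping step summarized above can be sketched as a small graph problem: exposed residues whose Cα atoms lie within D of each other and whose functional codes form an SSP are linked, and connected groups of linked residues form clusters. The sketch below is a minimal illustration, not the Mapitope code. The function name `find_clusters`, the residue dictionary layout, the decision to accept a pair in either orientation, and the toy coordinates are all assumptions made for the example.

```python
import math


def find_clusters(residues, ssps, d_max=9.0, e_min=5.0):
    """Link exposed residues that form an SSP within d_max and return clusters.

    residues: list of dicts with keys 'id', 'group' (functional code),
              'ca' (x, y, z of the C-alpha) and 'exposure' (percent).
    ssps:     set of two-letter pair types, e.g. {"CP", "CU"}.
    """
    exposed = [r for r in residues if r["exposure"] >= e_min]
    parent = {r["id"]: r["id"] for r in exposed}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    linked = set()
    for i, a in enumerate(exposed):
        for b in exposed[i + 1:]:
            if math.dist(a["ca"], b["ca"]) > d_max:
                continue
            # Assumption: a pair counts in either orientation (AB or BA).
            if a["group"] + b["group"] in ssps or b["group"] + a["group"] in ssps:
                parent[find(a["id"])] = find(b["id"])
                linked.update((a["id"], b["id"]))

    clusters = {}
    for rid in linked:
        clusters.setdefault(find(rid), set()).add(rid)
    return sorted((sorted(c) for c in clusters.values()), key=len, reverse=True)


# Toy data: invented coordinates, not taken from any structure.
toy = [
    {"id": 1, "group": "P", "ca": (0.0, 0.0, 0.0), "exposure": 40},
    {"id": 2, "group": "C", "ca": (3.8, 0.0, 0.0), "exposure": 20},
    {"id": 3, "group": "C", "ca": (6.0, 3.0, 0.0), "exposure": 30},
    {"id": 4, "group": "U", "ca": (9.0, 5.0, 0.0), "exposure": 50},
    {"id": 5, "group": "X", "ca": (40.0, 0.0, 0.0), "exposure": 60},
    {"id": 6, "group": "P", "ca": (2.0, 1.0, 0.0), "exposure": 1},  # buried
]
print(find_clusters(toy, {"CP", "CU"}))
```

In this toy case residue 6 is excluded by the accessibility filter and residue 5 is too far away to pair with anything, so a single cluster of four residues is reported at the default D.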


Phage display peptide panning against 80R scFv 


A variety of combinatorial phage display peptide 
libraries were screened with the 80R single chain 
variable fragment (scFv) (see Materials and 
Methods). Two independent panels of peptides
were isolated (Table 1). The peptides were derived
from two different types of random peptide
libraries: 42 peptides derived from cysteine-constrained
loop libraries were designated as panel 1


Table 1. Peptide data sets for Mapitope prediction of the 
80R epitope 


Panel 1 (42 peptides) Panel 2 (18 peptides) 


RSGGCVGGQYCLTPTH 
NDWPCLSHTTVCNGTQ 
ATMPCLSHPSVCKHLY 
PMHECLSAPSVCADNY 
TELACLSEAYICDRSN 
ETFTCISAPWTCVTWL 
EKMACLSTLDVCMENP 
NNMSCLSHETICGRNP 
LPFECTSKREVCDIPM 


LDSMHFPFHSRSFWP 
NLSCTHPLGSPPPAP 
GQICYYGRDAYLCFL 
CESSLCLMYSLGPPA 
OTPPCPIEHCPSFYQ 
OSTCLSHPLLCLSWN 
PNCWVGLTGAHSCFL 
THSVPVAYPWPDLNA 
SPLDYECISHATVCF 


SVDDCRWNLNCEPPP 
SEVYCPRPDRCLRAP 
VORDCRWTFSCATLI 
TPPRCSDOQMYCSLSR 
THOFCPDPKHCLAQP 
RMPPCMNAGECPTIA 
DTPDCXGNEKCLEYA 
TSNFCPAGGPCSPHG 
NPRVCMNKWECEQAI 
GPPLGCLSLSCYDVA 
WNDYCTMNOCDTHN 
KPLHCGDTFCSLNQ 
YLEHCTMNECLNAR 
NGYHCLSEFCMPHP 
SMEECRLWLCPPYE 
YKPWCEMNKCKPLA 
VMPECLSRLCDFDM 
DDMPGCY PMCTLNK 
YDSYCIMNFCGHAA 
YTAADCPGLLYLCP 
NDVRCKLWLCPMPD 
NNWPCLNETCPTKG 
VQWPCLSKOCNDNI 
YQADCLMNRCPTAE 
SAPECHLYYCPEQA 
ANPVCRLWMCPPIV 
ROTEPCNLWFCPOV 
REPPCVOVHCSTAK 
PKEQPWSEFRPAGM 
ADCTLWFCPOTSN 
CLSATCDCTLCGP 
FPELTCWTCLASS 
PPAYSCLCPWAHM 


YSTPSSLLDTHPLYK 
TLPPPCLSSPSRCVN 
RIMAPSDEPLPLGMP 
GTGLVPLFDPRYRFL 
SSSROEPYPLYPLES 
HPKVGEGIDFTSIVP 
ATDLLAAYPLYSPSL 
VVPLGRCVSHPAICA 
GFPCLSVASACYGIT 


Panel 1, peptides isolated with the 80R from phage display 
peptide libraries where cysteine residues are fixed. The pre-fixed 
cysteine residues are indicated in bold. Panel 2, peptides isolated 
from linear peptide libraries. 
and 18 peptides, derived from libraries of random linear peptides, were designated as panel 2. No common homologous motif was observed within the peptides themselves, or between the peptides and the SARS-CoV spike protein. This is not surprising in view of the fact that the epitope of 80R is conformational. Each set of peptides was used independently for Mapitope analysis, thus generating two independent predictions of the 80R epitope.


Analyzing the peptides and defining statistically 
significant pairs (SSPs) 


The first step in applying the algorithm is to 
“translate” the peptides into Mapitope functional 
notations (see above) and to deconvolute them into 
AAPs. Deconvolution of peptides into AAPs using 
the functional notation allows for 13 classes of 
amino acids and therefore 169 possibilities. How- 
ever, as 13 pairs are homodimers (e.g. AA, BB, etc.) 
the total number of different AAPs possible is 156. 
Deconvolution of the 42 peptides of panel 1 
produced a total of 568 AAPs which are represented 
by 133 different pair types. Taking ST>3, a total of 
11 pair types were found to be statistically 
significant pairs (SSPs). These 11 pair types (8% of 
all available 133 pair types) were represented by 108 
pairs (19% of all the 568 pairs). Similarly, deconvo- 
lution of the 18 peptides of panel 2 produced a total 
of 252 AAPs represented by 89 different pair types. 
Taking ST >3, a total of 12 pair types were found to 
be SSPs. These 12 pair types (13% of all available 89 
pair types) were represented by 60 pairs (24% of all 
the 252 pairs). 

The Mapitope predictions are based on focusing 
on those pairs that are statistically enriched. 
Figure 1(a) gives the 11 SSPs of panel 1 comparing 
the observed occurrence with the calculated 
expected occurrence based on total randomness. 
Note that in Figure 1(b) the highest value for 
occurrence does not necessarily promise the great- 
est statistical significance, since the statistical 
significance depends on the individual expectation 
of each SSP (for more explanation about random
expectation of SSPs and factors that can influence
this parameter see Enshell-Seijffers et al.).
Compare for example, the SSPs CU versus YC; CU 
appears 26 times in the peptides, which is five 
standard deviations greater than its expectation in 
the library (in a panel of 42 totally random peptides, 
CU is expected to appear 18.1 times). On the other 
hand, the SSP YC appears only six times, but is two 
times more abundant than would be expected; 
consequently its ST value is 4.76. An extreme case is 
the pair CJ which exists eight times in the peptides; 
however, its expected occurrence is 9.05 and there- 
fore this pair is actually under-represented (not 
shown). Similarly, analysis of the 18 peptides of 
panel 2 is shown in Figure 1(c) and (d). Of the 12 
pairs which are defined as SSP (ST=3) the most 
significant pairs are PU, CU and PP. 
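Since ST is defined as the number of standard deviations by which the observed count of a pair exceeds its random expectation, the selection of SSPs can be sketched as below. The text does not state how the expectation or its standard deviation is computed, so both are passed in by the caller; the function names and the standard deviation of 1.5 used in the example are hypothetical, and the resulting values are not the ST values reported in the paper.

```python
def statistical_threshold(observed: float, expected: float, sd: float) -> float:
    """Number of standard deviations by which observed exceeds expected."""
    if sd <= 0:
        raise ValueError("sd must be positive")
    return (observed - expected) / sd


def select_ssps(stats: dict, st_min: float = 3.0) -> dict:
    """Keep pairs whose ST is at least st_min.

    stats maps a pair type to (observed, expected, sd).
    """
    selected = {}
    for pair, (observed, expected, sd) in stats.items():
        st = statistical_threshold(observed, expected, sd)
        if st >= st_min:
            selected[pair] = round(st, 2)
    return selected


# Observed and expected counts are from the text; sd = 1.5 is a made-up value.
stats = {"CU": (26, 18.1, 1.5), "CJ": (8, 9.05, 1.5)}
print(select_ssps(stats))
```

Whatever the standard deviation, CJ cannot qualify, because its observed count (8) is below its expectation (9.05) and its ST is therefore negative.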
[Figure 1: histograms. (a), (b) Panel 1, 42 peptides; pairs on the horizontal axis: CU, CP, JC, PP, PJ, MX, XJ, YC, XP, HC, PM. (c), (d) Panel 2, 18 peptides; pairs on the horizontal axis: PU, CU, HP, PP, OH, YP, CZ, ZP, AY, PC, CY, MH. Vertical axis of (b) and (d): ST values.]

Figure 1. Computation of the SSPs derived from the 80R binding peptides ((a) and (c) for panel 1 and panel 2, respectively) and comparison of the observed occurrence (gray bars) with the calculated expected occurrence (open bars). The error bars represent a statistical threshold (ST) value of 3. Histograms (b) and (d) show the significance of each pair (ST values) based on the peptides of panel 1 and panel 2, respectively.


Preliminary prediction on the RBD of spike 
protein 


Once the analysis of the peptides was performed and the most significant amino acid pairs were identified, the next step was to map these pairs on the surface of the SARS-CoV spike protein. The most desirable starting point for this would be a solved atomic structure of the antibody’s antigen, in this case the receptor binding domain (RBD), but such a solved structure was not available when this study was initiated. Nonetheless, an alternative Mapitope prediction was conducted using a theoretical model of the spike, which was obtained by homology modeling between the SARS-CoV spike and the botulinum neurotoxin B. The 3-D structure of botulinum neurotoxin B served as a template for the prediction of the 3-D structure of the SARS-CoV spike. As previous studies of 80R have indicated that its epitope is contained within the RBD of the spike, our prediction was focused on this aspect of the modeled spike protein. Application of Mapitope entails a preliminary run of a given data set of peptides using the default parameters (ST=3, D=9 Å, E=5%). Such a procedure generates a first approximation of possible epitope candidates, i.e. “clusters”. The analysis of Table 1 panel 1 gave three possible clusters designated as clusters A, B and C (Table 2). The analysis of Table 1 panel 2 gave the same three clusters with the addition of a fourth cluster designated cluster D (Table 2). Therefore, at this point each cluster was analyzed independently.


Defining the limits of each cluster: modifying 
the D parameter 


The question that arises is how one can rank the clusters and identify which is a better candidate for the epitope than the others. For this, once a set of preliminary clusters is identified, the next step is to evaluate the behavior of each cluster, taking different D values ranging from 4 to 15 Å (the Cα to Cα distance for tandem residues (n, n+1) is 3-6 Å). Maintaining ST=3, the number of amino acids for each cluster was measured as a function of the distance between the two amino acids comprising a pair. As an example, Figure 2 illustrates the effect of distance on the four clusters of panel 2. Figure 2(a) shows the change in the number of amino acids in clusters A and C and Figure 2(b) shows the same for clusters B and D. Note that as the D value increases the number of amino acids increases, as expected. However, beyond a given point this increase gives a “quantum jump” in the number of amino acids associated with a given cluster; this is defined as the “Q point” (indicated by the gray arrows). The significant increase in the number of amino acids beyond the Q point could be the result of merging of adjacent clusters or recruitment of peripheral or underlying irrelevant amino acids, thus leading to a sharp increase in the number of amino acids associated with a given D value. For example, for cluster A the jump is at 12 Å, going from 11 amino acid residues to 31; for cluster D the Q point is at 13.5 Å (from 30 to 55 amino acid residues in the cluster; see Figure 2(a)). The Q points for clusters A, B and C in the first panel (42 peptides) and for clusters A-D in the second panel (18 peptides) are shown in Figure 2(c).
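The Q point can be read off a curve of cluster size versus D by looking for the first disproportionate jump. The sketch below is one possible way to automate this and is not taken from the paper: the function name `q_point` and the 1.5-fold jump criterion are assumptions, and the sizes at D values other than those quoted in the text are invented for the example.

```python
def q_point(sizes_by_d: dict, min_ratio: float = 1.5):
    """Return the first D at which the cluster size jumps by min_ratio-fold.

    sizes_by_d maps a D value (in angstroms) to the number of residues in
    the cluster at that D. Returns None if no such jump is found.
    """
    ds = sorted(sizes_by_d)
    for prev, cur in zip(ds, ds[1:]):
        before, after = sizes_by_d[prev], sizes_by_d[cur]
        if before > 0 and after / before >= min_ratio:
            return cur
    return None


# Cluster A on the theoretical model: 11 -> 31 residues at 12 A (from the
# text); the value at 10 A is invented for illustration.
cluster_a = {10.0: 9, 11.0: 11, 12.0: 31}
print(q_point(cluster_a))
```

A ratio criterion is used here rather than an absolute increment so that the same setting flags both the 11 to 31 jump of cluster A and the 30 to 55 jump of cluster D.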

Cluster D is not predicted in the analysis of 
panel 1 peptides. Moreover, as can be seen in 
Figure 2(b), it is based exclusively on pairs which 
Table 2. Amino acids predicted in each cluster A, B and C for panel 1 and panel 2 peptides using the theoretical model


A B C 
Panel 1 Panel 2 Panel 1 Panel 2 Panel 1 Panel 2 
Pro450 Phe334 
Glu452 Asn318 Pro335 
Asp454 Ile319 Ile319 Val337
Asn457 Asn321 Thr320 Tyr338 
Pro459 Leu322 Leu322 Ala339 
Pro462 Cys323 Cys323 Ala350 
Asp463 Pro324 Pro324 Tyr352 
Pro466 Pro466 Phe325 
Cys467 Cys467 Phe361 
Pro469 Pro469 Glu327 Glu341 Phe364 
Pro470 Pro470 Val328 Val328 Cys366 Cys366 
Leu472 Leu472 Asn330 Tyr367 Tyr367
Asn473 Thr332 Val369 Val369 
Cys474 Cys474 Ala371 
Tyr475 Tyr475 Tyr440 Tyr440 

Trp476 Tyr442 Tyr442 Leu374 Leu374

Pro477 Pro477 Leu443 Leu443 Asn375 

Leu478 His445 His445 Asp376 
Asn479 Leu377 Leu377 
Asp480 Cys378 Cys378 
Tyr481 Phe379 

Asn381 
Val382 Val382 
Tyr383 Tyr383 
Ala384 

Asp385 


The prediction for each peptide panel and cluster was made at the respective Q point (see Figure 2(c)) and at ST=3. Amino acids common to both panels are in bold.


are separated by at least 8.5 Å. This would be an unusual situation as it indicates that none of the pairs in this cluster are tandem in the linear sequence. Therefore, we consider cluster D as least likely to be the epitope of 80R. Figure 3 shows clusters A, B and C as predicted by Mapitope using panel 1 and panel 2 peptides. Table 2 summarizes the amino acids included for the three clusters A, B and C which are predicted at their respective Q points using ST=3 for each panel of peptides. Amino acids common to both panels are in bold. Table 3 shows the SSPs comprising each cluster and their significance according to the calculations that were made in Figure 1. Note that clusters A and B are the most varied as they contain the largest number of different SSPs and use the SSPs with the highest significance (e.g. the highly significant pair CP in panel 1, or the SSPs HP, PP, OC and PC that are used by clusters A and B but missing from cluster C in panel 2).

Figure 2. The effect of distance between amino acids comprising a pair on the number of amino acids within a cluster in the analysis of the peptides of panel 2 applied to the theoretical model of the SARS-CoV spike. The plots show the number of amino acids as a function of distance (Å). (a) Clusters A and C; (b) clusters B and D. The arrows indicate the Q points. (c) The table summarizes the Q points for the three clusters of panel 1 (data not shown) and for the four clusters of panel 2. All the predictions were conducted at ST=3.

Figure 3. RasTop spacefill presentation of clusters A (red), B (green) and C (yellow) as predicted from Mapitope analyses of panel 1 peptides (left panel) or panel 2 peptides (right panel) using the theoretical model of the spike RBD. Amino acids comprising each cluster are listed in Table 2.


Mapitope analysis based on the crystal structure
of the RBD of spike protein


During the course of this study, Li et al. solved the atomic structure of the RBD of the SARS-CoV with its receptor ACE2. This allowed us to repeat the Mapitope analysis, this time using the genuine atomic coordinates. Once this was completed, we were able to compare the two sets of predictions, and thereby gain insight as to the utility of Mapitope prediction using theoretical models for future studies where crystal structures have not been solved. In order to compare the two structures, we employed the FlexProt program, which is capable of detecting hinge regions and structurally aligning the rigid subparts of two 3-D structures (pair-wise alignment). In the comparison of the two RBD structures, residues 323-498, we found about 50% correspondence (89 matches out of 174 amino acid residues; RMSD=2.79 Å). This indicates that there is a general similarity between the genuine structure and the theoretical model used above.
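For orientation, the two summary numbers quoted for the structural comparison (fraction of matched residues and RMSD) can be computed as sketched below. This is not FlexProt, which additionally detects hinges and performs the alignment; the sketch assumes the matched Cα coordinates have already been superimposed, and the function names are hypothetical.

```python
import math


def rmsd(coords_a, coords_b) -> float:
    """Root-mean-square deviation between matched, superimposed coordinates."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))


def match_percent(matched: int, total: int) -> float:
    """Percentage of residues that were matched in the alignment."""
    return 100.0 * matched / total


# 89 matches out of 174 residues, as reported in the text.
print(round(match_percent(89, 174)))
```

The RMSD is computed over the matched residues only, so a low RMSD together with a match fraction of about one half describes two structures that agree well in part of the domain rather than throughout.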
As before, we used the SSPs of both peptide panels to perform Mapitope predictions on the crystal structure of the spike using the default parameters. Much to our satisfaction, clusters A, B and C described above were partially predicted anew (at least 50% overlap with the clusters predicted using the theoretical model), this time using the atomic coordinates of the crystal structure (this corresponds well with the FlexProt analysis described above). As is illustrated in Figure 4, the three clusters are easily identified at ST=3. In this case a fourth cluster (designated as cluster D) is also defined as distinct for the panel 1 peptides; it merges with cluster C in the case of panel 2. Increasing the ST value to 5 eliminates clusters C and D (panel 1) or markedly diminishes cluster C (panel 2) (not shown).

Identification of the Q point for each cluster and its effect on the predictions are shown in Figure 5. Clusters B and C have a Q point of 10.5 Å, above which the two clusters merge into one. In contrast, the prediction of cluster A is far more robust and tolerates D values as high as 12.5 Å before reaching a Q point. This distinguishes this cluster from the other two.

Considering the usage of SSPs and their ST 
values, here cluster A ranks the highest as is 
illustrated in Table 4. The amino acid residues 
included in the clusters using the crystal structure 
are listed in Table 5. In summary, cluster A stands 
out as being the most attractive potential candidate 
for the 80R epitope. 


Table 3. The number and the quantity of the SSPs used by each cluster as predicted on the surface of the theoretical model of the 193 amino acid segment of the spike


Pair CU CP JC PP PJ
Cluster ST 5.15 10.15 5.50 5.95 4.34
A — + + - - 
B — + + -- 
C + — 

Pair PU CU HP PP OH 
Cluster ST 5.29 5.38 7.85 5.06 4.29
A — + ~ 
B — + + -- - 
C + _ 


MX XJ YC XP HC PM
7.00 3.00 4.76 3.55 3.155 3.55
- -- —- 
— ~ —- —- 
—- 
YP CZ ZP AY PC CY MH 


5.00 5.00 3.04 3.57 B07 5.00 200 


- -- - — 
_ _- ~ —- - 
- ~ -- _ 


The table on the top shows panel 1 clusters and the bottom table shows panel 2 clusters. The ST values for each SSP are given (only those SSPs which have ST values greater than 3 are shown).
Figure 4. Left and right panels: RasTop representation
of clusters A (red), B (green), C (yellow) and D (cyan) on
the crystal structure of the SARS-CoV S protein RBD in the
analysis of panel 1 or panel 2, respectively.


In Figure 6 the cluster A (colored in red) and the common amino acid residues (colored in yellow) predicted by both the theoretical model and genuine structure of the Spike RBD are shown in the crystal structure of the complex of the SARS-CoV S protein RBD and receptor ACE2. The compactness of the genuine structure is obvious and here cluster A becomes a tight protrusion comprised of three segments. Residues 455-463 form an ascending strand that then crosses over as a traversing segment (residues 463-472) followed by the descending segment (residues 473-476). The distance maintained by five hydrogen bonds between the ascending and descending segments is about 5 Å, which is shorter than the limits of the traversing segment (13.4 Å). This therefore imposes a force flipping the traversing segment forward (viewing the ascending segment on your right). The orientation and position of this segment are stabilized by the disulfide between Cys467 and Cys474 and a series of nine hydrogen bonds cross-linking the top of the structure within itself and to the ascending and descending segments.

In view of this compact and stable structure, the Mapitope prediction of cluster A gains a robustness that is lacking for clusters B and C. This is particularly noticeable considering the impact of the D parameter on the predictions (see Figure 5). In the case of the theoretical model, the Q point for cluster A is 12 Å, where a sharp increase from 11 to 31 amino acid residues occurs. The Q point for cluster A in the crystal structure shifts to 13.5 Å, where the increase is from 18 amino acid residues to over 60. This illustrates that the prediction is basically constant and that the structure of cluster A is relatively unchanged throughout the range of D values of 6 Å to 13 Å.

Finally, in view of the fact that the mechanism of
neutralization by mAb 80R has been proposed to be
interference of viral association with its receptor,
one cannot escape the fact that in the co-crystal,
cluster A overlaps with a critical segment of the
Spike:RBD interface. Several amino acids that lie
within or juxtaposed to this predicted epitope affect
spike protein structure globally (e.g. C464, C474),
others affect Spike:ACE-2 and Spike:80R specifically


Figure 5. The impact of the distance (parameter D) on the number of amino acids within a cluster. Panel 2 peptides
were used for Mapitope prediction on the SARS-CoV S crystalline structure. The images are RasTop space-fill
representations of the spike protein RBD indicating the three clusters; A (red), B (green) and C (yellow) at different D
values (left image, 6 Å; middle, 10 Å; right, 12 Å). All the predictions were conducted at ST=3.




Table 4. The number and the quantity of the SSPs used by each cluster as predicted on the surface of the crystalline structure of the SARS-CoV RBD spike (D=9 Å)


Pair CU CP JC PP 

Cluster ST 5.156 10.15 5.50 5.95 
A aa + af 
B = = - 
C oe 
D of 

Pair PU CU HP PP OH 
Cluster ST 5.29 5.38 7.85 5.06 4.29 
A + f = = 
B ae ale 
C se t af 


PJ MX XJ YC XP HC PM
4.34 7.00 3.00 4.76 3.55 3.155 3.55 
+ + + 
+ + 
+ 
+ + + 
YP CZ ZP AY PC CY MH 
5.00 5.00 3.04 3.57 3.57 3.53 3.53
+ + + + + + 
+ + + + + 
+ + + + + 


The table on the top shows panel 1 clusters and the bottom table shows panel 2 clusters. The ST values for each SSP are given (only those SSPs which have ST values greater than 3 are shown).


(e.g. E452, D454). In addition, a critical amino acid
in the predicted epitope has been shown to be
specifically involved in Spike:80R molecular
interactions (D480), while another amino acid, L472,
had no effect. Nevertheless, one can see in Figure 6
how antibodies to the predicted epitope would
interfere with Spike:ACE-2 interactions.


Discussion 


The Mapitope algorithm was developed for the
localization of B-cell epitopes based on the analysis
of phage displayed affinity purified peptides.
Validation of the algorithm has been achieved by
first determining the defining parameters using the
17b:HIV gp120 co-crystal as a known control
model. Subsequently, the algorithm was shown
to be efficient in predicting the epitope of the
anti-HIVp24 mAb 13b5, also co-crystallized with its
antigen (HIVp24). In a third co-crystal model, a
published panel of 27 phage displayed peptides
specific for the Bo2C11 mAb that binds factor VIII
was used as input with the atomic structure of its
antigen (factor VIII) taken from the co-crystal
published by Spiegel et al. The Mapitope algorithm
predicted two clusters; the major one (cluster
B) coincided with the genuine epitope (E. Bublil,
personal communication). The strategy of using
multiple independent peptide data sets has also 
been tested using the Trastuzumab (Herceptin®) 
mAb which was co-crystallized with its corre- 


Table 5. Amino acids predicted in each cluster; A, B and C for panel 1 and panel 2 peptides as predicted using the genuine coordinates of the spike RBD


A B C 
Panel 1 Panel 2 Panel 1 Panel 2 Panel 1° Panel 2 

Tplabsyaballs Cys323 Cys323 Phe364
Asn457 Pro324 Pro324 Cys366 Cys366
Val458 Val458 Phe325 Tyr367 Tyr367
Pro459 Pro459 Giusy Val369 Val369
Phe460 Val328 Ala398

Pro462 Pro462 Ile345 Ile345 Cys419

Asp463 Cys348 Cys348 Gln396
Pro466 Pro466 Val349 Val349 Pro399 Pro399

Cys467 Cys467 Ala350 Gln401
Pro469 Pro469 Asp351 Pro413 Pro413
Pro470 Pro470 Tyr352 Tyr352 Asp414 Phe416

Ala471 Ala371 Asp415

Leu472 Leu472 Met417
Asn473 Cys419
Cys474 Cys474 Leu448
Tyr475 Tyr475 Pro450 Pro450
Trp476 Phe451

Glu452
Leu499 Leu499
Phe501


Amino acids common to both panels are in bold. Amino acids that were predicted in the analysis of the theoretical model are 
highlighted in gray. The analysis was conducted when D=9 A and ST=3. 
* The amino acids of cluster D are included in this list as well (see the text). 
Figure 6. Presentation of cluster A and the common
amino acids predicted by both the theoretical model and
genuine structure of the spike RBD in the crystal structure
of the complex of the SARS-CoV S protein RBD and
receptor ACE2. The spike RBD is shown in cornflower
blue. The ACE2 is presented in green. Cluster A (residues
450-480 of the spike) is colored red. The predicted
common residues (highlighted in Table 5) are colored in
yellow.


sponding antigen (the cellular receptor Her-2/neu). 
In this case all three segments of the bona fide 
epitope were correctly predicted when two peptide 
panels were used for Mapitope data bases 
(unpublished results). Further validation of Mapi- 
tope has been published by Enshell-Seijffers et al."° 
in the analysis of the murine mAb CGI10 (an 
antibody specific for the HIV gp120-CD4 complex) 
where the prediction was confirmed by functional 
reconstitution.'° Thus, Mapitope predictions have 
been validated by four separate mAb:antigen co- 
crystals and one case of epitope confirmation by 
physical reconstitution. 

Here we apply this system to the analysis of a
mAb against the major neutralization epitope in the
RBD of the spike protein, to which 80R and several
other human mAbs are directed. Our efforts to
delineate the structure of the 80R epitope with
overlapping peptide ELISA scans were unsuccessful,
suggesting, along with other published data, that
the neutralizing epitope(s) are conformational.
This region of the RBD appears to be highly
immunogenic, and neutralizing human antibodies have
been recovered from non-immune phage display
libraries, human Ig transgenic mice and
EBV-immortalized B cells from convalescent blood of a
SARS-CoV infected individual. Other studies have
identified two other neutralizing epitopes on the
spike protein that appear to be mostly linear, one
outside the RBD in S1 and a few others in the S2
region; however, the mechanisms by which antibodies
to these regions lead to neutralization have not been
elucidated. Although a number of methods
exist to delineate the structure of epitopes (e.g.
mutagenesis, docking in silico, neutralization escape
studies and others), all ultimately produce a
collection of candidate epitopes, and no
current method provides a single solution with
any degree of confidence. Thus, the objective of
our analysis was to reduce the problem of
conformational epitope mapping to a limited
number of candidates that can be tested and
validated experimentally.

The predictions based on the theoretical model
score clusters A and B as more likely candidates
for the 80R epitope than cluster C when the behavior
of the clusters is considered as a function of
parameter variation. When the parameters D and ST
are altered, it becomes apparent that cluster C draws
on fewer SSPs, and that these have lower ST
values (variation in parameter E, surface
accessibility, had little bearing on ranking the
significance of the clusters). Nonetheless, a dilemma
remained: can one discriminate between clusters A
and B and identify which of them is the better
prediction of the genuine 80R epitope? Here the
strength of using a high-resolution atomic structure
based on empirical X-ray analysis of the antigen's
crystal becomes apparent: cluster A, as determined
using the coordinates of the crystal structure
of the RBD, becomes markedly more significant than
cluster B. This gave us a firm basis to focus on
cluster A as most likely being the 80R epitope. It
furthermore illustrates that, whenever possible, one
should use the most detailed and highest-resolution
structure of the antigen as input for Mapitope
analysis.
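
The role of the distance parameter D can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: each surface residue is reduced to a single representative coordinate, a statistically significant pair (SSP) is written as two one-letter codes in alphabetical order, and the function name, the input layout and the coordinates in the example are hypothetical rather than taken from the Mapitope implementation.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def map_pairs_to_surface(residues, ssps, d_max=9.0):
    """Return residue pairs whose amino acid types form an SSP and whose
    representative coordinates lie within d_max angstroms of each other.

    residues: list of (label, one_letter_code, (x, y, z)) tuples (hypothetical layout)
    ssps:     set of two-letter strings in alphabetical order, e.g. {"PW"}
    """
    hits = []
    for (label_a, aa_a, xyz_a), (label_b, aa_b, xyz_b) in combinations(residues, 2):
        pair = "".join(sorted(aa_a + aa_b))  # order of the two residues is ignored
        if pair in ssps and dist(xyz_a, xyz_b) <= d_max:
            hits.append((label_a, label_b))
    return hits


# Toy input: residue labels are borrowed from Table 5, coordinates are invented.
toy_surface = [
    ("Pro459", "P", (0.0, 0.0, 0.0)),
    ("Trp476", "W", (5.0, 0.0, 0.0)),
    ("Pro324", "P", (30.0, 0.0, 0.0)),
    ("Cys467", "C", (3.0, 4.0, 0.0)),
]
print(map_pairs_to_surface(toy_surface, {"PW"}, d_max=9.0))
```

Raising `d_max` admits more distant residue pairs and so tends to enlarge or merge clusters, whereas lowering it has the opposite effect; this is the kind of parameter variation referred to above.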

There have been several attempts to map
conformational epitopes of antibodies in the
absence of solved crystal structures of their
corresponding antigens. One approach is
to use theoretical models of the antigen, based on
sequence alignment with an alternative protein
template whose atomic structure has already been
worked out. Of specific relevance is the study
by Myers et al., who used a panel of
affinity-purified phage-displayed peptides to assist in
the localization of the epitope corresponding to the
MICA3 and MICA4 mAbs that bind the major
diabetes antigen, glutamic acid decarboxylase
(GAD65). Their analyses identified five different
prospective solutions, which were further studied
via mutagenesis. Here we present for the first time a
comparative study between predictions based on a
theoretical model of the SARS-CoV spike on the one
hand and on the recently published crystal
structure of the SARS-CoV RBD on the other. As
described previously, there is about 50%
correspondence between the two structures; nonetheless,
it appears that this level of similarity is sufficient, as
Mapitope analyses of the peptide panels predicted
three clusters for each structure that shared 50-70%
identity between them (comparing the clusters of the
theoretical model with those of the crystal structure;
see Tables 2 and 5). This is an extremely intriguing
result, as it illustrates the potential of Mapitope
analyses in situations where crystal structures are not
available. The construction of theoretical models is
almost routine where sequence homologies can be
identified; all that is then necessary is to
screen the mAb of interest against phage libraries so
as to produce a satisfactory peptide database and apply
the algorithm for epitope prediction.
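
The correspondence between a cluster predicted on the theoretical model and its counterpart on the crystal structure can be quantified as a shared-residue fraction. The Python sketch below is one plausible way to do so, not necessarily the calculation behind the 50-70% figure: the function name, the choice to normalize by the smaller cluster, and the residue sets in the example are all assumptions made for illustration.

```python
def shared_fraction(cluster_model, cluster_crystal):
    """Fraction of residues shared by two predicted clusters.

    Normalizing by the smaller cluster is an assumption; the text does not
    state which denominator was used.
    """
    a, b = set(cluster_model), set(cluster_crystal)
    if not a or not b:
        raise ValueError("both clusters must contain at least one residue")
    return len(a & b) / min(len(a), len(b))


# Hypothetical residue sets, for illustration only.
model_cluster = {"Pro459", "Pro462", "Cys467", "Leu472"}
crystal_cluster = {"Pro459", "Pro462", "Asn457", "Val458", "Tyr475", "Cys474"}
print(f"{shared_fraction(model_cluster, crystal_cluster):.0%}")
```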

Although empirical approaches may lead to
successful vaccine development, rational design of
epitope-based vaccines using proven neutralizing
mAbs as templates for epitope discovery is an
important and worthwhile goal that could be
applied to other new and emerging infectious
diseases. This approach may eliminate the
unwanted induction of non-neutralizing and
enhancing antibodies that has been documented
in SARS, dengue fever and respiratory syncytial
virus. Such unwanted induction may be inherent even
in subunit vaccines because of the proximity of these
epitopes to the neutralizing epitopes that are
sought. For this reverse immunological approach,
one must be able to backtrack from the selected
mAb to its corresponding epitope and ultimately
reconstitute the epitope into a functional
immunogen. The current study focuses on the first aspect
of this paradigm, i.e. the discovery of a neutralizing
epitope of the SARS-CoV spike protein. The 80R mAb is
a very attractive case in point, as it has been shown to
be extremely potent in virus inactivation in vitro and
in vivo. Analysis of the mechanism of action has led
to the conclusion that the mAb interferes with
virus:receptor binding; however, identifying the
specific residues involved in 80R binding, i.e. the
precise composition of its epitope, is still a
formidable challenge, especially in view of the fact
that the epitope has been shown to be
conformational. While our studies provide a
demonstration of a robust computational approach that
can be applied to neutralizing epitope discovery
and a roadmap of how these advances may be
applied in the future, the value of these predictions
will ultimately be determined in functional studies
in which reconstructed and stabilized neutralizing
epitopes based on the cluster predictions are tested
in vaccine studies, and when the 80R:S1 protein
co-crystal is solved.


Materials and Methods 


Production of 80R scFv 


80R scFv was expressed and purified as described.
The VH and VL genes of 80R scFv were cloned into the
prokaryotic expression vector pSynI. The scFv was
expressed in Escherichia coli XL1-Blue (Stratagene, La
Jolla, CA) and purified from the periplasmic fractions by
immobilized metal affinity chromatography.
Peptide libraries


The fUSE5/15-mer, F88-4/15-mer, and F88-4/Cys1/13-mer
phage display peptide libraries display random
linear 13-15-mer peptides. The F88-4/Cys1/13-m23
library is a constrained-loop library containing two
cysteine residues within its sequence. The complexity of
the libraries is estimated to be 2 × 10° for fUSE5 and 5.5 ×
10’ for F88-4/Cys1. These peptide libraries were selected
with the 80R scFv.


Affinity selection with 80R scFv and screening for 80R-binding clones


10'* plaque-forming units (pfu) of phage-peptides
prepared from each library were introduced
individually for panning into Maxisorp immunotubes
(Nunc, Naperville, IL) coated with 10 µg of 80R scFv.
Non-specifically adsorbed phages were removed by
intensive washing. Specifically bound phages were
eluted, neutralized, amplified and used for further
rounds of selection as described. After three rounds
of panning, randomly picked single phage clones
were screened for specific binding to 80R scFv by
ELISA. In brief, 96-well Maxisorp
immuno-plates were coated with 0.5 ng/well of 80R scFv
or a control scFv and blocked with phosphate-buffered
saline (PBS) containing 4% (w/v) non-fat milk.
Individual phage-peptide clones in
PBS containing 2% non-fat
milk were then added. Specifically bound phages were
detected by adding HRP-conjugated mouse anti-His, and
the system was developed by adding TMB substrate.
Absorbance at 450 nm was measured. Clones that
bound to 80R scFv with A450 values of >1.0 were scored
as positive, whereas negative clones gave values of <0.2.
Unique positive clones were identified by DNA
sequencing, and the derived peptide sequences were used
for Mapitope analysis.
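
The scoring rule above can be written as a short Python sketch. The thresholds (A450 > 1.0 positive, < 0.2 negative) come from the text; the function names, the "indeterminate" label for readings between the two thresholds, and the example readings are assumptions introduced here for illustration.

```python
POSITIVE_MIN = 1.0  # A450 above this value: clone scored as positive (from the text)
NEGATIVE_MAX = 0.2  # A450 below this value: clone scored as negative (from the text)


def classify_clone(a450):
    """Score a single phage clone from its absorbance at 450 nm."""
    if a450 > POSITIVE_MIN:
        return "positive"
    if a450 < NEGATIVE_MAX:
        return "negative"
    # The text does not say how intermediate readings were handled.
    return "indeterminate"


def unique_positive_peptides(clones):
    """Return the distinct peptide sequences of positive clones, in first-seen order.

    clones: iterable of (peptide_sequence, a450) pairs (hypothetical layout)
    """
    seen = []
    for peptide, a450 in clones:
        if classify_clone(a450) == "positive" and peptide not in seen:
            seen.append(peptide)
    return seen


# Invented readings, for illustration only.
example_clones = [
    ("CPWAC", 1.4),
    ("CPWAC", 1.2),  # same peptide recovered twice
    ("GHTLS", 0.1),
    ("KLMNP", 0.6),
    ("TTPWK", 2.0),
]
print(unique_positive_peptides(example_clones))
```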


The Mapitope algorithm 


The Mapitope program was implemented in C++ and
runs in about a minute (on a Windows XP machine with a
single Pentium 4 1.80 GHz processor and 256 KB cache).
The output of Mapitope is written as a RasTop script,
which can be cut and pasted into RasTop in
order to view the predicted clusters on the surface of
the antigen. The first five clusters are color-coded
from the most likely to the least likely epitope prediction.
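
The first step of the analysis, deconvoluting peptides into amino acid pairs (AAPs) and retaining the statistically significant pairs (SSPs), can be sketched in Python as follows. This is not the Mapitope code, which was written in C++. The sketch assumes that AAPs are adjacent residues with their order ignored, that all 20 amino acids are equally frequent in the library, and that significance is a normal-approximation z-score compared against ST; the published statistic may differ, and all names are hypothetical.

```python
from collections import Counter
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_pairs(peptide):
    """Split a peptide into adjacent residue pairs, ignoring order (WP == PW)."""
    return ["".join(sorted(peptide[i:i + 2])) for i in range(len(peptide) - 1)]


def significant_pairs(peptides, st=3.0):
    """Return {pair: z_score} for pairs observed at least st standard deviations
    above a uniform random expectation (assumed null model)."""
    counts = Counter(pair for pep in peptides for pair in amino_acid_pairs(pep))
    total = sum(counts.values())
    freq = 1.0 / len(AMINO_ACIDS)  # assumed uniform residue frequency
    result = {}
    for pair, observed in counts.items():
        # An unordered pair of two different residues can arise in two orders.
        p = freq * freq * (1 if pair[0] == pair[1] else 2)
        expected = total * p
        sd = sqrt(total * p * (1.0 - p))
        z = (observed - expected) / sd
        if z >= st:
            result[pair] = z
    return result


# Invented peptides, for illustration only; real panels are larger, and
# z-scores from such a small sample are not meaningful.
toy_panel = ["APWC", "GPWH", "KWPL", "TPWS"]
print(sorted(significant_pairs(toy_panel, st=10.0)))
```

In this toy panel every peptide contains the P/W pair in one order or the other, so it is the only pair that passes the deliberately high threshold; with real data the threshold would be the ST value quoted in the text.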